Projects and Content


Article: Towards Data Science: “Machine Learning’s Public Perception Problem”

In this piece, I take a close look at why the public doesn’t have an accurate understanding of machine learning/AI, and explain what the negative results will be if data science doesn’t work to fix this problem.

Keywords: public policy, education, machine learning


Article: Towards Data Science: “Archetypes of the Data Science Role”

A discussion of the different forms a data science job can take, including a practical explanation of what the responsibilities look like. I also discuss the overloading of individual roles and how that can cause burnout.

Keywords: data science, careers


Article: Towards Data Science: “Machine Learning Engineers: What do they actually do?”

My thoughts on the evolution of job titles and career paths in the DS/ML field, including some discussion of the possible DEI implications.

Keywords: dei, machine learning


Series: project44 Tech Blog: “Setting Healthy Boundaries: Generating Geofences at Scale with Machine Learning”

My step-by-step description of how I designed and implemented the Smart Geofences feature at project44.

Keywords: gis, geofences, machine learning, devops, airflow, python


Article: Towards Data Science: “Starting a New Machine Learning Model”

In this article I described my process for starting a new ML project, and gave tips from my experience for newer data scientists.

Keywords: machine learning, tutorial, advice


Podcast: Appearance on “Beautiful Bastards Podcast”

The hosts of this show gave me an hour to talk about all kinds of issues in DS/ML, including ethics, data privacy, risk/rewards, D&I, and more. I mentioned a couple of things in the episode that are linked below.

Keywords: ethics, machine learning


Series: Cloud Based Machine Learning Workflows

In this series, I took a close look at five areas of the data science or machine learning development workflow, telling you what the cloud approach looks like versus the local approach, explaining the pros and cons of each, and describing a few tools you might want to try.

Keywords: python, machine learning, cloud computing


Tutorial: Deploying ML Models - Flask or Voila, API or Web App!

Data science model deployment can sound intimidating if you have never had a chance to try it in a safe space. In this blog post, I give the end-to-end explanation of how you go from zero to a deployment that you can share with others.

Keywords: python, HTML, machine learning, deployment, API


Article: List Comprehensions in Python

List comprehensions are incredibly powerful, but also can be unintuitive and confusing. I wrote a guide to understanding and using them, designed for those who use for-loops in their regular programming.

Keywords: python, programming, beginners
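
As a small illustration of the for-loop to list comprehension translation that the guide is about, here is a minimal sketch. The data and variable names are hypothetical and are not taken from the article itself.

```python
# Hypothetical example data, not taken from the article.
numbers = [1, 2, 3, 4, 5, 6]

# For-loop version: collect the squares of the even numbers.
squares_loop = []
for n in numbers:
    if n % 2 == 0:
        squares_loop.append(n * n)

# Equivalent list comprehension: the output expression comes first,
# then the loop, then the optional filter condition.
squares_comp = [n * n for n in numbers if n % 2 == 0]

print(squares_loop == squares_comp)
print(squares_comp)
```

Reading the comprehension as "expression, for each item, if condition" mirrors the three lines of the loop body, which is usually the easiest way in for someone used to for-loops.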


Tutorial: Parallel GAN Training on GPU: Generate Your Own Images

Generative Adversarial Networks (GANs) are an increasingly popular and very powerful form of computer vision deep learning, and they require a whole lot of compute to be done effectively and speedily. This is just the sort of use case where parallelization with Dask clusters can make a difference in your workflow, so I wrote a tutorial on how to do it!

Keywords: python, machine learning, deep learning, dask


Article: Beyond Matplotlib and Seaborn: Python Data Visualization Tools That Work

I wrote a blog post to accompany several talks and a GitHub repo all about using different Python visualization tools. Get the code at https://github.com/skirmer/new-py-dataviz.

Keywords: python, data visualization


Article: Dask and pandas: There’s No Such Thing as Too Much Data

Do you love pandas, but hate when you reach the limits of your memory or compute resources? Dask gives you the chance to use the pandas API with distributed data and computing. In this article, you’ll learn how it really works, how to use it yourself, and when/if to switch.

Keywords: python, pandas, dask


Article: Eager Data Scientist’s Guide to Lazy Evaluation with Dask

Lazy evaluation is the core of parallelization, but it doesn’t have to be confusing or complicated. In this guide, learn the basic concepts you need to get started! I use Dask to demonstrate, but this is useful for anyone trying to get the hang of parallel computation.

Keywords: python, parallelization, dask
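
To make the idea of lazy evaluation concrete, here is a minimal sketch. It is an assumption-laden illustration: the guide demonstrates with Dask, while this sketch swaps in plain Python generators so it runs with only the standard library. The function and variable names are hypothetical.

```python
# Record every call so we can see exactly when work happens.
calls = []

def slow_square(x):
    """Stand-in for an expensive computation (hypothetical helper)."""
    calls.append(x)
    return x * x

# Eager: the list comprehension runs every call immediately.
eager = [slow_square(x) for x in range(3)]

# Lazy: the generator expression only describes the work; nothing runs yet.
lazy = (slow_square(x) for x in range(3, 6))
calls_after_definition = list(calls)  # snapshot: only the eager calls so far

# The work is done only when a result is actually requested.
result = sum(lazy)

print(calls_after_definition)
print(result)
print(calls)
```

Dask applies the same principle at a larger scale: operations build up a task graph, and computation is deferred until a result is explicitly requested.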


Opinion: Your Data Scientist Does Not Need a STEM Ph.D.

You should not require or ask for a generic STEM Ph.D. for data scientist candidates. In this blog post, I give a detailed argument on this and discuss the critiques of the practice.

Keywords: data science, social science


Tutorial: Combining Dask and PyTorch for Better, Faster Transfer Learning

I use the Stanford Dogs dataset again, this time to demonstrate accelerating transfer learning to improve ResNet50. The inner workings of how PyTorch supports multi-machine, multi-GPU training can be confusing, but I have deciphered it for you here.

Keywords: machine learning, python, deep learning


Tutorial: Computer Vision at Scale with Dask and PyTorch

I use the Stanford Dogs dataset to demonstrate accelerating an image classification problem with GPU clusters. If you have been thinking about GPUs but don’t know where to begin, or what they might be good for, I recommend this as a place to start!

Keywords: machine learning, python, deep learning


Article: 3 Ways to Schedule and Execute Python Jobs

Job scheduling is what takes academic machine learning to production level for real business or project value. I went through three tools (cron, Airflow, and Prefect) in this article and discussed the pros and cons of each. Depending on your task and circumstances, any one of these tools might be what you need.

Keywords: python, job scheduling, airflow


Article: Make Your Data Move: Using Schedulers with Data Storage to Generate Business Value

In concert with my presentation at ODSC Europe 2020, I wrote a blog post to discuss why and how you might use scheduled jobs to make your data infrastructure work better for modeling and machine learning.

Keywords: job scheduling, airflow, data warehousing

